Molecular Systems Biology
Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Molecular Systems Biology's content profile, based on 142 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.
Pathak, E.; Tom, R. Z.; Kim, M.; Sachs, S.; Zhang, Y.; Walter, M.; Pfluger, P. T.; Feuchtinger, A.; Dyar, K. A.; Bergman, B. C.; Pleitez, M. A.; Lutter, D.; Hofmann, S. M.
Intermuscular adipose tissue (IMAT) expansion is closely associated with cardiometabolic disease (CMD), yet its cellular organization and regulatory mechanisms remain poorly defined. Here, we define a human IMAT gene signature using bulk transcriptomics and identify candidate regulators of IMAT function, including the adipogenic transcription factor early B-cell factor 2 (EBF2). To determine how these programs are organized in situ, we mapped this signature in a mouse model of diet-induced CMD using spatial transcriptomics. We found that IMAT expansion occurs within discrete stromal niches surrounding muscle fibers, characterized by coordinated activation of adipogenic, extracellular matrix, inflammatory, and metabolic pathways. Spatial analyses showed that fibro-adipogenic progenitor (FAP) abundance does not predict adipocyte formation, supporting a model of localized and context-dependent lineage transitions. Cross-species comparison revealed partial conservation of human IMAT gene programs, validating the mouse model and highlighting species-specific features. Functional experiments in human primary myoblasts showed that EBF2 is sufficient to induce adipogenic reprogramming. Our findings establish IMAT as an active, spatially organized remodeling niche and identify lineage plasticity as a central mechanism driving its expansion in metabolic disease.
Soundararajan, V.; Venkatakrishnan, A. J.; Murugadoss, K.; K, P.; Varma, G.; Aman, A.
Semaglutide has shown benefit in metabolic dysfunction-associated steatohepatitis (MASH), but real-world evidence across longitudinal liver phenotypes remains limited, particularly regarding how liver remodeling relates to weight loss and dose exposure. Using a de-identified federated electronic health record network spanning more than 29 million patients in the United States, including 489,785 semaglutide-treated adults, we analyzed 6,734 patients with baseline liver disease burden. We find that higher attained pre-landmark (0-2 years) semaglutide dose was associated with lower post-landmark (2-4 years) risk of steatohepatitis, alcoholic liver disease, and all-cause mortality, whereas greater pre-landmark weight loss was associated with lower post-landmark risk of steatohepatitis, steatotic liver disease, and hepatorenal syndrome, indicating distinct dose- and weight-linked patterns of long-term liver benefits. These associations were notable because semaglutide prescribing was generally lower during the post-landmark period, raising the possibility of durable benefit beyond peak exposure. To better understand the mechanistic basis of liver protection, we performed a complementary longitudinal study of 326 adults with paired noninvasive liver elastography measurements before and after treatment initiation. Median liver stiffness decreased from 4.85 [3.02 - 7.20] to 3.9 [2.6 - 5.8] kPa after semaglutide initiation (median change = -0.38 kPa; p<0.001), with 194 of 326 patients (59.5%) showing lower follow-up stiffness. A clinically meaningful reduction of at least 20% was observed in 133 of 326 patients (40.8%), and 69 of 326 (21.2%) shifted to a lower fibrosis stage by prespecified elastography thresholds. Larger improvements were also seen in patients with higher baseline stiffness (p<0.001); notably, 80% of patients with cirrhosis-range baseline stiffness (≥12.5 kPa) achieved ≥20% improvement versus 29.5% with minimal baseline disease (p<0.001).
The proportion achieving at least 20% stiffness improvement was similar across weight-loss strata, including patients with no weight loss or weight gain and those with at least 10% weight loss (38.0% in each group), and liver stiffness change showed negligible correlation with changes in weight, BMI, HbA1c, alanine aminotransferase, or aspartate aminotransferase. To provide biological context, single-cell RNA analyses demonstrated sparse overall hepatic GLP1R expression (0.0239%), with enrichment in non-parenchymal niches including cholangiocytes, intrahepatic cholangiocytes, liver sinusoidal endothelial cells, and hepatic stellate cells implicated in fibrogenesis and vascular remodeling. Together, this real-world evidence suggests diverse liver benefits for semaglutide beyond weight loss, with intricate dose-response relationships.
Hayford, C. E.; Baleami, B.; Stauffer, P. E.; Paudel, B. B.; Al'Khafaji, A.; Brock, A.; Quaranta, V.; Tyson, D. R.; Harris, L. A.
Drug-tolerant persisters (DTPs) represent a major obstacle to durable responses in targeted cancer therapy. DTPs are commonly described as distinct single-cell states that survive drug treatment via reversible, non-genetic mechanisms and drive tumor recurrence. Recent work demonstrates that multiple DTPs can coexist, reflecting diversity in lineage, signaling programs, or stress responses. However, each DTP is still generally viewed as a uniform cellular phenotype. Building on our prior work describing a population-level DTP termed "idling" [Paudel et al., Biophys. J. (2018) 114, 1499-1511], here we present evidence supporting a fundamentally different view: that DTPs are not single-cell states, but rather heterogeneous populations composed of multiple sub-states with distinct division and death rates that balance to produce near-zero net population growth. Using single-cell transcriptomics and lineage barcoding, we identify multiple phenotypic states within idling DTP populations, with reduced heterogeneity compared to untreated populations, and find that idling DTP cells emerge from nearly all lineages. Transcriptomic and functional analyses further reveal altered ion-channel activity in idling DTPs, which we confirm experimentally. Moreover, drug-response assays reveal increased susceptibility of idling DTPs to ferroptosis, a non-apoptotic form of regulated cell death, indicating the emergence of vulnerabilities associated with drug tolerance. Altogether, our results support a population-level view of tumor drug tolerance in which DTPs comprise stable collections of phenotypic states, shaped by treatment-defined phenotypic landscapes, which are potentially vulnerable to subsequent interventions. 
This perspective implies that eradicating DTPs will require a fundamental shift away from cell-type-centric strategies toward sequential treatments that progressively reduce phenotypic heterogeneity by modulating the molecular and cellular processes that establish the DTP landscape, an approach previously termed "targeted landscaping."
Murugadoss, K.; Venkatakrishnan, A.; Soundararajan, V.
GLP-1 receptor agonists induce substantial weight loss, but the extent to which lean tissue and physical function are preserved in routine care remains poorly understood. Using an EHR-linked body-composition digital phenotyping pipeline with LLM-based extraction, we performed a large-scale longitudinal analysis of 670,422 first-episode GLP-1RA users, including 456,742 treated with semaglutide and 213,680 treated with tirzepatide. Among these, 7,965 individuals with paired pre- and post-initiation body-composition measurements were analyzed over 12 months. Tirzepatide was associated with greater relative lean body mass (LBM) loss than semaglutide at each measured time point, with excess LBM losses of 1.1%, 1.5%, 1.3% and 2% at 3, 6, 9 and 12 months, respectively. A Depletive GLP-1 metabotype, defined as >20% total body weight (TBW) loss with >5% LBM loss, was significantly more frequent with tirzepatide than semaglutide during the first year of therapy (10.3% versus 6.7%, p<0.001). By contrast, a Prime GLP-1 metabotype, defined as >10% TBW loss with <5% LBM loss, was numerically more frequent with semaglutide than tirzepatide, but not significantly so (12.3% versus 11.8%, p=0.66). Higher drug dose and longer exposure were associated with progressively greater LBM decline in both treatment groups (both p<0.001). Among 3,746 examined EHR phenotypes, baseline musculoskeletal pain emerged as the most significant correlate of greater LBM loss (BH-adjusted q<0.001): cervicalgia (semaglutide, -4.1 percentage points; tirzepatide, -14.3 percentage points) and knee pain (semaglutide, -4.8 percentage points; tirzepatide, -13.4 percentage points), consistent with mobility-limited patients being more vulnerable to lean-tissue depletion during incretin therapy. 
Analysis of EHR notes for on-treatment functional features showed that reduced exercise tolerance was the strongest correlate of greater LBM loss, increasing by 7.2 and 11.1 percentage points in semaglutide- and tirzepatide-treated patients, respectively. An independent analysis of all available single-cell RNA-seq data from human musculature showed a broader distribution of GIPR+ cells than GLP1R+ cells across immune, stromal, vascular, and contractile compartments, providing plausible biological context for the greater LBM loss observed in routine care with tirzepatide (dual GLP1R-GIPR agonist) relative to semaglutide (GLP1R-specific agonist). In this observational study, greater weight-loss efficacy did not necessarily translate into more favorable body-composition outcomes, underscoring the need for clinical decision-making and trial designs that maximize each patient's likelihood of achieving a Prime GLP-1 metabotype.
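The two metabotype definitions quoted in this abstract can be written as a small rule. The sketch below is illustrative only: it assumes losses are given as positive percentages of the baseline value, and the function name and the "Unclassified" label are hypothetical, not taken from the preprint.

```python
def classify_metabotype(tbw_loss_pct: float, lbm_loss_pct: float) -> str:
    """Apply the thresholds quoted in the abstract.

    Assumption: both arguments are losses relative to baseline, expressed
    as positive percentages (e.g. 12.0 means a 12% loss).
    """
    # Depletive: >20% total body weight loss together with >5% lean mass loss.
    if tbw_loss_pct > 20 and lbm_loss_pct > 5:
        return "Depletive"
    # Prime: >10% total body weight loss with <5% lean mass loss.
    if tbw_loss_pct > 10 and lbm_loss_pct < 5:
        return "Prime"
    # Hypothetical label for everyone who meets neither definition.
    return "Unclassified"


if __name__ == "__main__":
    for tbw, lbm in [(22.0, 8.0), (12.0, 3.0), (6.0, 1.0)]:
        print(tbw, lbm, classify_metabotype(tbw, lbm))
```

Under these thresholds a patient with more than 20% weight loss but less than 5% lean mass loss still falls in the Prime group, since only the lean mass criterion separates the two definitions at that level of weight loss.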
Strobl, E. V.
Motivation: Complex disorders arise from multiple genetic mechanisms, but most drug-prioritization methods treat each disorder as a single phenotype and therefore miss locus-specific therapeutic opportunities. Results: We present SIEVE, a framework that decomposes complex disorders into genetically localized subphenotypes and links GWAS summary statistics, reference expression, and perturbational transcriptional profiles to prioritize compounds that target locus-anchored disease mechanisms. SIEVE also constructs genetically calibrated mechanism vectors, projects away nonspecific expression programs using negative anchors, and aggregates evidence across cell lines, doses, and time points to produce robust drug rankings. Across simulations and analyses of real data, SIEVE improves compound prioritization relative to existing methods and shows that subphenotype-aware, genetics-guided modeling can sharpen therapeutic discovery in heterogeneous disorders. Availability and Implementation: R implementation: github.com/ericstrobl/SIEVE.
Fischer, J.; Spindler, M. P.; Britton, G. J.; Weiler, J.; Tankelevich, M.; Dai, D.; Canales-Herrerias, P.; Jha, D.; Rajpal, U.; Mehandru, S.; Faith, J. J.
Our understanding of human mucosal T cell clonotype distribution in health and disease has centered on immunodominant antigens. We performed single-cell T cell receptor (TCR) and RNA sequencing as an untargeted approach to define distributions of T cell clonal groups in health and ulcerative colitis (UC) across 333,088 T cells in colon and peripheral blood. Healthy donor-specific TCR repertoires had limited blood-colon clonal sharing, which was highest in cytotoxic T effector memory (Tem) populations and lowest in regulatory T cells (Tregs), reflecting tissue-based compartmentalization. Within healthy colon, TCR repertoires showed high T cell clonal sharing independent of anatomic distance, associated with high intra-clonal phenotypic diversity. Colon cytotoxic and Th17 populations showed high dispersion across sites, while Tregs were compartmentalized. Clonal lineages dispersed across blood and colon upregulated trafficking markers, suggesting active movement between tissues, while those dispersed across colon sites upregulated residency markers, suggesting that intra-colon repertoire sharing is mediated by long-term, slow-moving clonal groups. In UC, Tregs were expanded across inflamed sites, and CD8 Tem clonal groups were increased and showed greater dispersion regardless of inflammation. These findings reveal principles of T cell clonal organization in the human colon during health and disease, identifying opposing patterns of clonal dispersion among Treg and Th17 clonal groups, high phenotypic diversity within dispersed clonal groups, and elevated cross-colon dispersion of CD8 Tem clonotypes in UC.
Skotte, N. H.; Cankar, N.; Qvist, F. L.; Frahm, A. S.; Pilely, K.; Svenstrup, K.; Kjaeldgaard, A.-L.; Garred, P.; Petersen, S. W.
Amyotrophic lateral sclerosis (ALS) is a rapidly progressing neurodegenerative disease with a heterogeneous clinical presentation, complicating early diagnosis and therapeutic monitoring. To identify disease-specific biomarkers, we performed an unbiased cerebrospinal fluid (CSF) proteomic analysis in 87 ALS patients, 89 healthy controls, and 61 neurological controls using mass spectrometry. Across all quantified proteins, 399 were significantly dysregulated in ALS, including established neurodegeneration (NEFL, NEFM, UCHL1) and neuroinflammatory (CHIT1, CHI3L1, CHI3L2) markers. Correlation and pathway analyses uncovered dysregulation of immune, synaptic, and metabolic processes, with aberrant complement activation emerging as a hallmark. Complement proteins increased progressively with declining ALS Functional Rating Scale-Revised scores and longer disease duration, whereas early-stage markers (CLSTN3, CHAD, RELN) indicated presymptomatic neuronal and synaptic disruptions. Machine learning identified a minimal five-protein CSF panel (MB, ITLN1, YWHAG, FCGR3A, PGAM1) that accurately distinguished ALS patients from healthy controls, capturing disease-specific pathophysiology beyond general neurodegeneration. Our findings define a robust ALS-specific CSF proteomic signature, reveal prognostic protein candidates across disease stages, and provide a framework for diagnostic biomarker development, enabling earlier intervention and monitoring.
Cotto, O.; Birgy, A.; Magnan, M.; Bechet, S.; Bonacorsi, S.; Cohen, R.; Levy, C.; Nowrouzian, F. L.; Tenaillon, O.; Blanquart, F.
The worldwide rise in the prevalence of extended-spectrum beta-lactamase (ESBL) producing Escherichia coli is a major public health concern. In Europe, ESBL carriage frequency increased and then stabilized at about 6-8%. Past antibiotic use and travel to countries with high ESBL frequency, notably in South-East Asia, have repeatedly been identified as risk factors for ESBL carriage. Yet the relative contributions of these mechanisms to the observed maintenance of a stable low frequency of ESBL in Europe remain unknown. Here, we used comprehensive data on the risk factors for carriage of ESBL-producing E. coli in the French community, alongside detailed microbiological characterization of both resistant and overall E. coli, to develop a biologically plausible mathematical model of ESBL resistance spread in France. The model also includes several mechanisms previously shown to favor coexistence, such as population structure, variability in carriage duration, and within-host dynamics. The level of resistance in the community implies that resistant strains transmit 14% less than sensitive strains (95% credible interval 0.6-38%) and are cleared at a 23% higher rate (0.9-62%). ESBL resistance is predicted to be strongly associated with factors prolonging residence in the gut. Both the rate of antibiotic treatment and the rate of transmission strongly impact the frequency of ESBL in the community. In contrast, travel has little impact on ESBL frequency. Whether reducing treatment or transmission is best to reduce resistance depends on community-specific parameters. Our study opens perspectives for the quantitative study of resistance evolution and argues for future work to improve the characterization of the duration of carriage of commensal bacterial strains.
Tsiara, I.; Vouzaxaki, E.; Ekström, J.; Rameika, N.; Yang, F.; Jain, A.; Iglesias Alonso, A.; Sjöblom, T.; Globisch, D.
Cancer is among the leading causes of death worldwide. The discovery of biomarkers is of utmost importance for diagnosis and disease monitoring. Herein, we performed a comprehensive metabolomics biomarker discovery effort in plasma from 615 lung, ovarian and colorectal cancer patients at diagnosis and 95 non-cancerous control subjects. This pan-cancer investigation identified specific panels of metabolites in the entire sample cohort with high discriminating power, as demonstrated by combined ROC AUC values of up to 0.95. The identified metabolites are mainly associated with lipid and amino acid metabolism as well as xenobiotic transformation. These metabolite panels of high predictive power provide new metabolic insights into these cancers and demonstrate the potential of metabolomics for improved diagnosis and monitoring of disease progression.
Zhai, T.; Babu, M.; Fuentealba, M.; Al Dajani, S.; Gladyshev, V. N.; Furman, D.; Snyder, M.
Quantitative measures for tracking functional health have generally been lacking. Intrinsic capacity (IC) has been proposed as an appropriate measure, but its metrics have been derived from small datasets and sparse longitudinal data. Using harmonized measures of cognition, locomotion, sensory function, vitality, and psychological well-being from 501,615 UK Biobank participants followed for a median of 15.5 years, we derived domain-specific and composite IC scores. We examined associations with incident disease, cause-specific mortality, multimorbidity, lifestyle and socioeconomic factors, and multi-omic profiles from Olink proteomics, NMR metabolomics, clinical biochemistry, and blood-cell traits. We found that composite IC declined non-linearly with age, and within-person decline was steeper than the cross-sectional age measures. Participants with greater baseline morbidity, those who subsequently developed incident disease, and those who died earlier in follow-up showed lower IC trajectories across adulthood. The IC domains were only modestly correlated with one another, supporting multidimensionality, yet higher overall IC was associated with lower risk of most diseases examined. The dominant IC domain varied by endpoint, with cognition informative for dementia, sensory function for hearing loss, psychological capacity for depression, locomotion for osteoarthritis, and vitality for cardiometabolic outcomes. IC was also associated cross-sectionally with physical activity, insomnia, smoking, medication burden, and socioeconomic disadvantage. More proteins were found to be predictive of vitality, and enrichment converged on immune/inflammatory and metabolic pathways. Blood-based surrogates recapitulated part of the phenotypic signal, particularly for vitality.
Overall, this IC framework captures longitudinal health trajectories and broad disease vulnerability in a large middle- to older-aged cohort, supports IC as a clinically meaningful, multidomain phenotype of aging, and identifies blood-based correlates that may facilitate future at-scale monitoring of aging-related functional decline.
Ullman, T.; Krantz, D.; Avenel, C.; Lung, M.; Svedman, F. C.; Holmsten, K.; Ostling, P.; Ullen, A.; Stadler, C.
Effective predictive biomarkers for immune checkpoint inhibitor (ICI) therapy remain an unmet need across solid tumors. Here, we present an integrated spatial proteomics workflow that combines in situ proximity ligation assay with multiplexed immunofluorescence to directly resolve PD1/PDL1 signaling events at the level of defined cellular phenotypes and their spatial organization within intact tumor tissue. Applied as a proof of concept to tumor samples from patients with metastatic urothelial carcinoma treated with pembrolizumab, this approach reveals that PD1/PDL1 interactions specifically involving cytotoxic CD8+CD3+ T cells are significantly enriched in complete responders, while such interactions are rare in patients with progressive disease. This interaction-defined T cell subset achieves superior discrimination of clinical response compared to single-marker PDL1 expression or immune cell abundance alone. By integrating direct detection of protein-protein interactions with high-dimensional single-cell phenotyping, our workflow provides a mechanistically informed, spatially resolved biomarker of functional immune engagement. Beyond urothelial carcinoma, this platform establishes a generalizable framework for translating spatial signaling biology into predictive tools for immunotherapy response across tumor types.
Velazquez, D.; Molnar, C.; Reina, J.; Mora, J.; Gonzalez, C.
Ewing sarcoma (EwS) is an aggressive, human-exclusive tumor typically driven by the EWS::FLI1 fusion protein. To assess whether the neomorphic functions of EWS::FLI1 are fundamentally dependent on evolutionarily recent cofactors such as ETS transcription factors (ETS-TFs), Polycomb group (PcG) proteins, CBP/p300, or specific subunits of the BAF complex, we expressed EWS::FLI1 in the model organism Saccharomyces cerevisiae. This minimal system was chosen because several key EWS::FLI1 cofactors possess greatly reduced sequence homology (e.g., BAF) or are lacking altogether (e.g., ETS-TFs, PcG, or CBP/p300). We used co-IP/MS to map the yeast interactome, ChIP-Seq to identify gDNA binding sequences, RNA-Seq for global gene expression, and engineered reporters to test conversion of (GGAA) tandem repeats (GGAASat) into neoenhancers. We found that the yeast EWS::FLI1 interactome was more limited than, and qualitatively distinct from, its human counterpart, sharing core machinery (e.g. RNA Polymerase II, FACT) but lacking the BAF/SWI-SNF and spliceosome complexes, and showing strong enrichment for the SAGA chromatin remodeling complex. We also found that EWS::FLI1 binds to hundreds of sites in the yeast genome with a clear preference for putative ETS-TF consensus sequences and (CA) dinucleotide repeats. Yet EWS::FLI1-expressing cells presented only minimal transcriptional dysregulation, a stark contrast to the extensive changes observed in human and Drosophila cells. Finally, we found that EWS::FLI1 successfully converted silent GGAASat sequences into active enhancers in yeast. This remarkable result occurs despite the absence of homologs for key human activators, such as CBP/p300, strongly suggesting that EWS::FLI1 can mobilize functionally related, non-homologous pathways to establish neoenhancers at GGAASat sites.
Altogether, our results indicate that EWS::FLI1's core ability to drive GGAASat-dependent gene expression is a conserved, ancient property, while GGAASat-independent extensive transcriptome reprogramming is dependent on co-factors and pathways specific to animal cells.
Schwoebel, J.; Frasch, M.; Spalding, A.; Sewell, E.; Englert, P.; Halpert, B.; Overbay, C.; Semenec, I.; Shor, J.
As health systems begin deploying autonomous AI agents that make independent clinical decisions and take direct actions within care workflows, ensuring patient safety and care quality requires governance standards that go beyond existing medical device frameworks designed for human-in-the-loop prediction tools. This paper introduces the Healthcare AI Agents Regulatory Framework (HAARF), a comprehensive verification standard for autonomous AI systems in clinical environments, developed collaboratively with 40+ international experts spanning regulatory authorities, clinical organizations, and AI security specialists. HAARF synthesizes requirements from nine major regulatory frameworks (FDA, EU AI Act, Health Canada, UK MHRA, NIST AI RMF, WHO GI-AI4H, ISO/IEC 42001, OWASP AISVS, IMDRF GMLP) into eight core verification categories comprising 279 specific requirements across three risk-based implementation levels. The framework addresses critical gaps in health system readiness for autonomous AI including: (1) progressive autonomy governance with clinical accountability, (2) tool-use security for agents that independently access EHRs, medical devices, and clinical systems, (3) continuous equity monitoring and bias mitigation across diverse patient populations, and (4) clinical decision traceability preserving human oversight authority. We validate HAARF's enforcement capabilities through a scenario-based red-team evaluation comprising six adversarial scenarios executed under baseline (no middleware) and HAARF-guardrailed conditions (N = 50 trials each, Gemini 2.5 Flash primary with Claude Sonnet 4.6 cross-model validation). In baseline conditions, the agent model executes unauthorized tools in 56-60% of adversarial trials. Under the HAARF condition, deterministic middleware enforcement reduces the unauthorized-tool success rate to 0%, with 0% contraindication misses and 0% policy-injection success (95% Wilson CI [0.00, 0.07]).
Cross-model validation confirms identical security metrics, supporting HAARF's model-agnostic design. Mapping analysis demonstrates 48-88% coverage of major regulatory frameworks, with per-category FDA alignment ranging from 73% (C5, Agent Registration) to 91% (C3, Cybersecurity; C7, Bias & Equity). Initial validation with healthcare organizations shows a 40-60% reduction in multi-jurisdictional compliance burden and improved clinical safety governance outcomes. HAARF provides health systems with a practical, risk-stratified pathway for safe AI agent deployment, shifting from reactive compliance to proactive quality governance while maintaining rigorous patient safety standards and human-centered care principles.
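The reported interval can be checked against the standard Wilson score formula. The sketch below is not the authors' code; it assumes 0 observed successes out of the stated N = 50 trials and a two-sided 95% level, and the function name is chosen here for illustration.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the boundaries.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(0, 50)
    print(f"95% Wilson CI for 0/50: [{low:.2f}, {high:.2f}]")
```

With zero events the upper bound reduces to z²/(n + z²), which is about 0.07 for n = 50, so a 0% observed rate in 50 trials is still compatible with a true rate of several percent.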
Omar, M.; Agbareia, R.; McGreevy, J.; Zebrowski, A.; Ramaswamy, A.; Gorin, M.; Anato, E. M.; Glicksberg, B. S.; Sakhuja, A.; Charney, A.; Klang, E.; Nadkarni, G.
Large language models are increasingly used for clinical guidance while their parent companies introduce advertising. We tested whether pharmaceutical ads embedded in the prompts of 12 models from OpenAI, Anthropic, and Google shift drug recommendations across 258,660 API calls and four experiments probing distinct epistemic conditions. When two drugs were both guideline-appropriate, advertising shifted selection of the advertised drug by +12.7 percentage points (P < 0.001), with some model-scenario pairs shifting from 0% to 100%. Google models were the most susceptible (+29.8 pp), followed by OpenAI (+10.9 pp), while Anthropic models showed minimal change (+2.0 pp). When the advertised product lacked evidence or was clinically suboptimal, models resisted. This reveals a structured vulnerability: advertising does not override medical knowledge but fills the space where clinical evidence is underdetermined. An open-response subanalysis (2,340 calls across three representative models) confirmed that advertising restructures free-text clinical reasoning: models echoed ad claims at 2.7 times the baseline rate while maintaining high stated confidence and rarely disclosing the ad. Susceptibility was provider-dependent (Google: +29.8 pp; OpenAI: +10.9 pp; Anthropic: +2.0 pp). Because this bias operates within clinically correct answers, it is invisible to accuracy-based evaluation, identifying a class of AI safety vulnerability that standard testing cannot detect.
Timonina, V.; Fellay, J.; the Swiss HIV Cohort Study (SHCS)
Clonal hematopoiesis of indeterminate potential (CHIP) is an age-associated condition linked to chronic inflammation and an increased risk of cardiovascular diseases and hematological malignancies. People with HIV (PWH) exhibit a higher prevalence of CHIP than the general population, but the mechanisms underlying this association remain unclear. In particular, it is unknown whether the excess burden of CHIP reflects earlier emergence of mutant clones, altered clonal expansion dynamics, or differences in selective pressures acting on hematopoietic stem cells. We reconstructed longitudinal trajectories of CHIP variant allele frequency (VAF) in 52 PWH using serial peripheral blood samples spanning up to 25 years from the Swiss HIV Cohort Study. We used spline-based modelling to estimate clone size and growth dynamics, and dynamic time warping to identify common trajectory patterns. Associations between clonal dynamics and longitudinal immune parameters were assessed using linear mixed-effects models. Trajectories in PWH were compared with publicly available longitudinal CHIP data from the SardiNIA population cohort. We identified heterogeneous clonal dynamics consistent with known gene-specific fitness patterns. Larger clone size was associated with lower CD4 T-cell count and lower CD4/CD8 ratio. Compared with the general population cohort, PWH showed higher VAF across the observed age range and steeper early trajectory increases, while long-term expansion rates were broadly similar. Greater variability in clonal dynamics among PWH suggests a stronger contribution of host environmental factors to clonal fitness. These findings support a model in which HIV-associated immune dysregulation alters the hematopoietic fitness landscape, contributing to earlier detectable clonal expansion and increased burden of CHIP in PWH.
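Dynamic time warping, named in this abstract as the tool for grouping trajectories, compares two series while allowing them to be stretched in time. The sketch below is the textbook dynamic-programming form with an absolute-difference cost; the preprint's actual settings (step pattern, windowing, normalisation) are not stated here, and the VAF values are made up for illustration.

```python
def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical VAF trajectories: same shape, but the second grows more slowly.
    fast_clone = [0.01, 0.02, 0.04, 0.08]
    slow_clone = [0.01, 0.01, 0.02, 0.02, 0.04, 0.04, 0.08, 0.08]
    flat_clone = [0.01, 0.01, 0.01, 0.01]
    print(dtw_distance(fast_clone, slow_clone))
    print(dtw_distance(fast_clone, flat_clone))
```

Because warping absorbs differences in timing, two clones that follow the same growth shape at different speeds end up close together, while a flat trajectory stays distant from an expanding one.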
Sooknah, M.; Srinivasan, R.; Sankarapandian, S.; Chen, Z.; Xu, J.
Genome-wide association studies (GWAS) have transformed our understanding of human biology, but are constrained by the need for predefined phenotypes. We introduce Vector2Variant (V2V), a general-purpose framework that transforms any set of high-dimensional measurements (such as machine learning embeddings) into a genome-wide scan for associations, without requiring rigid specification of a phenotype. Rather than testing genetic variants against single traits, V2V finds the axis in multivariate space along which carriers and non-carriers maximally differ, and produces a continuous "projection phenotype" that can be interpreted by association with disease labels. The projection phenotypes correlate with orthogonal clinical biomarkers never seen during training, suggesting the learned axes capture biologically meaningful variation. We applied V2V to imaging, timeseries, and omics modalities in the UK Biobank and recovered established biology (like the role of CASP9 in renal failure) without the need for targeted measurements, alongside novel associations including a frameshift variant in LRRIQ1 (potentially protective for cardiovascular disease). V2V is computationally efficient at genome-wide scale, producing summary statistics and disease associations that facilitate target prioritization without the need for phenotype engineering.
Shepherd, F.; Slaney, C.; Jones, H. J.; Dardani, C.; Stergiakouli, E.; Sanderson, E. C. M.; Hamilton, F.; Rosoff, D. B.; Rek, N.; Gaunt, T. R.; Davey Smith, G.; Richardson, T. G.; Khandaker, G. M.
Systemic inflammation is implicated in various diseases, yet its upstream determinants remain poorly examined. We conducted a large-scale two-sample Mendelian randomisation (MR) study to systematically evaluate the potential causal effects of 3,213 molecular (metabolomic, proteomic), physiological and disease traits on circulating interleukin-6 (IL-6) and C-reactive protein (CRP) levels. Genetic instruments were derived from genome-wide association studies and analysed using inverse variance weighted (IVW), weighted median, and MR-Egger methods with multiple testing correction. Bidirectional MR was performed to assess reverse causation. After Bonferroni correction, evidence of potential causal effects was observed for 72 traits on CRP and 9 traits on IL-6. CRP was predominantly influenced by metabolomic traits, especially lipid and fatty acid measures. Genetically proxied adiposity (body mass index and obesity), triglyceride-rich lipoproteins, glycoprotein acetyls (GlycA), and apolipoprotein E increased CRP levels, whereas HDL-related cholesterols, polyunsaturated fatty acids, and glutamine decreased CRP. Most associations were consistent across MR methods, supporting the robustness of these results. As expected, IL-6 had a large effect on CRP. IL-6 was influenced primarily by adiposity and HDL-related lipid measures, with generally smaller effect sizes and limited support across sensitivity analyses. Bidirectional analyses indicated little evidence that CRP directly drives metabolic traits when restricting to cis-acting instruments, whereas genetically proxied IL-6 signalling showed consistent downstream effects on HDL particle concentration and composition. Adiposity is a shared upstream determinant of both inflammatory biomarkers, with stronger and broader effects on CRP. These findings suggest that CRP acts as an integrated downstream readout of systemic inflammatory burden, whereas IL-6 reflects a more tightly regulated and context-dependent process.
Our work clarifies traits that may causally influence systemic inflammation and highlights biological pathways linking inflammation to cardiometabolic and inflammatory diseases. By mapping upstream determinants of IL-6 and CRP, we also provide a resource to prioritise key drivers for mechanistic study and therapeutic targeting.
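For readers unfamiliar with the IVW method named in the abstract above, the following is a minimal sketch of the standard fixed-effect inverse variance weighted estimator, not the authors' pipeline. The function name `ivw_estimate` and the toy numbers are assumptions made for illustration; the inputs are per-variant effect estimates on the exposure and on the outcome, plus the standard errors of the outcome effects.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect of exposure on outcome.

    Hypothetical helper for illustration only. Each list holds one value per
    genetic variant: the variant-exposure effect, the variant-outcome effect,
    and the standard error of the variant-outcome effect.
    """
    if not beta_exposure or not (
        len(beta_exposure) == len(beta_outcome) == len(se_outcome)
    ):
        raise ValueError("inputs must be non-empty and of equal length")

    # Each variant is weighted by the inverse variance of its outcome effect.
    weights = [1.0 / (se ** 2) for se in se_outcome]

    # Weighted regression of outcome effects on exposure effects through the origin.
    numerator = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    if denominator == 0:
        raise ValueError("all exposure effects are zero; estimate is undefined")

    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error


# Toy data (invented): outcome effects are exactly twice the exposure effects.
estimate, standard_error = ivw_estimate(
    [0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.05, 0.05, 0.05]
)
```

This sketch ignores between-variant heterogeneity and pleiotropy, which is why studies of this kind typically report weighted median and MR-Egger estimates alongside IVW.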
MacGregor, H. A. J.; Blundell, J. R.; Easton, D. F.
Pathogenic variants in TP53, the key tumour-suppressor gene underlying Li-Fraumeni syndrome (LFS), are among the best-established causes of inherited cancer predisposition. However, large-scale sequencing has revealed that many apparently pathogenic TP53 variants detected in blood are the result of somatic clonal expansions, complicating risk interpretation. Using blood-derived whole-exome data from 469,391 UK Biobank participants, we combined variant allele fraction (VAF) with haplotype-sharing analysis to distinguish germline and somatic TP53 variants. Germline variants were concentrated at sites linked to partial loss of p53 function and lower disease penetrance, whereas classic LFS alleles appeared almost entirely somatic. High-VAF carriers of classic LFS alleles had a markedly increased risk of haematological malignancy but not solid tumours, consistent with large TP53-mutant clonal expansions. The prevalence of somatic clonal expansion also correlated with missense variant pathogenicity, suggesting that somatic activity provides an informative in vivo proxy for functional impact. These results provide new insights into TP53-associated cancer risk at the population level, demonstrate that somatic rather than germline risk predominates in middle-aged healthy adults, and provide a scalable framework for variant classification in large-scale population genomics.
Ma, Z.; Qiao, Y.
Background: The enterotype concept proposed that gut microbiomes cluster into discrete types, but subsequent critiques demonstrated that such clustering depends on methodological choices, that the number of clusters is not fixed, and that faecal samples cannot capture spatial heterogeneity along the gastrointestinal tract. The stomach remains particularly understudied, and no systematic classification exists for gastric microbial community types. Methods: We assembled a multi-cohort dataset of 566 gastric mucosal samples spanning healthy controls to gastric cancer, with both Helicobacter pylori (HP)-negative and HP-positive individuals. Critically, we applied the key methodological lessons of the enterotype debate: we used a variational autoencoder (VAE) for dimensionality reduction to learn a continuous latent representation without forcing discrete structure, determined the optimal number of clusters using the Silhouette index (an absolute validation measure) across K=2 to K=10 rather than arbitrarily selecting a cluster number, and performed transparent evaluation of multiple clustering solutions. This VAE-plus-silhouette workflow directly addresses the critiques leveled against the original enterotype analysis. Results: Four gastotypes were identified, with K=4 achieving the highest mean silhouette score, indicating good cluster cohesion and separation. Two gastotypes (Variovorax-type and Trabulsiella-type) were significantly enriched in HP-positive samples, while two gastotypes (Bacteroides-type and Streptococcus-type) were significantly enriched in HP-negative samples. Random Forest and Gradient Boosting achieved excellent baseline performance for predicting HP infection (AUC = 0.990 and 0.993). Conclusions: The VAE-plus-silhouette workflow provides a robust, data-driven approach for identifying gastotypes without forcing discrete structure or arbitrarily fixing cluster numbers. 
Using this framework, we identified four gastotypes with significantly different HP infection rates. Variovorax-type and Trabulsiella-type showed strong HP-positive enrichment, while Bacteroides-type and Streptococcus-type showed strong HP-negative enrichment. These findings demonstrate that methodological advances from the enterotype controversy can be successfully transferred to the stomach, offering a reproducible taxonomy for stratifying HP infection status with potential clinical utility.
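The silhouette-based choice of cluster number described in the abstract above can be sketched as follows. This is an illustrative sketch, not the authors' code: it assumes candidate cluster labelings for each K have already been produced by some clustering step on the latent coordinates (the abstract does not specify the algorithm), and the function names `silhouette_score` and `best_k` and the toy points are hypothetical.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = defaultdict(list)
    for index, label in enumerate(labels):
        clusters[label].append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # By convention a singleton cluster contributes a width of 0.
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        total += (b - a) / denominator if denominator > 0 else 0.0
    return total / len(points)


def best_k(points, labelings):
    """Return (K, score) for the labeling with the highest mean silhouette.

    `labelings` maps each candidate K to a list of cluster labels.
    """
    scores = {k: silhouette_score(points, labels) for k, labels in labelings.items()}
    k = max(scores, key=scores.get)
    return k, scores[k]


# Toy latent coordinates (invented): two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labelings = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 2],  # needlessly splits the second group
}
chosen_k, chosen_score = best_k(points, labelings)
```

On the toy data the K=2 labeling scores higher than K=3 because splitting a compact group lowers the widths of its members. In the study itself the same comparison would run over K=2 to K=10.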
Pore, M.; Balamurugan, K.; Atkinson, A.; Breen, D.; Mallory, P.; Cardamone, A.; McKennett, L.; Newkirk, C.; Sharan, S.; Bocik, W.; Sterneck, E.
Circulating tumor cells (CTCs), and especially CTC clusters, are linked to poor prognosis and may reveal mechanisms of metastasis and treatment resistance. Unbiased methods for the functional characterization of CTCs in liquid biopsies are therefore urgently needed. Here, we present an evaluation of multiplex imaging mass cytometry (IMC) to analyze CTCs in mice with human xenograft tumors. In a single-step process, IMC uses metal-labeled antibodies to simultaneously detect a large number of proteins/modifications within minimally manipulated small volumes of blood from the tail vein or heart. We used breast cancer cell lines and a patient-derived xenograft (PDX) to assess antibodies for cross-species interpretation. Along with manual verification, HALO-AI-based cell segmentation was used to identify CTCs and quantify markers. Despite some limitations regarding human-specificity, this technology can be used to investigate the effect of genetic and pharmacological interventions on the properties of single and clustered CTCs in tumor-bearing mice.